CONTEXT: The Tanzanian tourism sector plays a significant role in the Tanzanian economy, contributing about 17% to the country’s GDP and 25% of all foreign exchange revenues. The sector, which provides direct employment for more than 600,000 people and up to 2 million people indirectly, generated approximately $2.4 billion in 2018 according to government statistics. Tanzania received a record 1.1 million international visitor arrivals in 2014, mostly from Europe, the US and Africa. Tanzania is the only country in the world which has allocated more than 25% of its total area for wildlife, national parks, and protected areas. There are 16 national parks in Tanzania, 28 game reserves, 44 game-controlled areas, two marine parks and one conservation area.
AIM: The aim of this project is to explore the data and build a linear regression model that predicts the spending behaviour of tourists visiting Tanzania. The model can be used by tour operators and the Tanzania Tourism Board to help tourists across the world estimate their expenditure before visiting Tanzania.
#IMPORTING NECESSARY LIBRARIES
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
#Importing dataset for the analysis.
Tz = pd.read_csv("Train .csv")
Tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | NaN | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | NaN | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4804 | tour_993 | UAE | 45-64 | Alone | 0.0 | 1.0 | Business | Hunting tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Credit Card | No | No comments | 3315000.0 |
| 4805 | tour_994 | UNITED STATES OF AMERICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 11.0 | 0.0 | Cash | Yes | Friendly People | 10690875.0 |
| 4806 | tour_995 | NETHERLANDS | 1-24 | NaN | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | others | Independent | ... | No | No | No | No | 3.0 | 7.0 | Cash | Yes | Good service | 2246636.7 |
| 4807 | tour_997 | SOUTH AFRICA | 25-44 | Friends/Relatives | 1.0 | 1.0 | Business | Beach tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | No | Friendly People | 1160250.0 |
| 4808 | tour_999 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 4.0 | 7.0 | Cash | Yes | Friendly People | 13260000.0 |
4809 rows × 23 columns
#Showing the first 20 records
Tz.head(20)
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | NaN | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | NaN | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| 5 | tour_1005 | UNITED KINGDOM | 25-44 | NaN | 0.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | No | Yes | Yes | No | 9.0 | 3.0 | Cash | Yes | Wildlife | 120950.0 |
| 6 | tour_1007 | SOUTH AFRICA | 45-64 | Alone | 0.0 | 1.0 | Business | Mountain climbing | Friends, relatives | Independent | ... | No | No | No | No | 9.0 | 0.0 | Cash | Yes | Friendly People | 466140.0 |
| 7 | tour_1008 | UNITED STATES OF AMERICA | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 10.0 | 3.0 | Cash | Yes | Friendly People | 3480750.0 |
| 8 | tour_101 | NIGERIA | 25-44 | Alone | 0.0 | 1.0 | Leisure and Holidays | Cultural tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 4.0 | 0.0 | Cash | Yes | NaN | 994500.0 |
| 9 | tour_1011 | INDIA | 25-44 | Alone | 1.0 | 0.0 | Business | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | Yes | Friendly People | 2486250.0 |
| 10 | tour_1012 | BRAZIL | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Radio, TV, Web | Independent | ... | No | No | No | No | 17.0 | 3.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 1117155.0 |
| 11 | tour_1013 | CANADA | 45-64 | Children | 2.0 | 0.0 | Leisure and Holidays | Beach tourism | Friends, relatives | Independent | ... | No | No | No | No | 30.0 | 0.0 | Cash | No | Excellent Experience | 8121750.0 |
| 12 | tour_1016 | CANADA | 45-64 | Children | 0.0 | 2.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 11.0 | 3.0 | Cash | Yes | No comments | 331500.0 |
| 13 | tour_1017 | MALT | 25-44 | Friends/Relatives | 2.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Package Tour | ... | Yes | No | No | No | 10.0 | 0.0 | Cash | Yes | No comments | 11346650.0 |
| 14 | tour_1018 | MOZAMBIQUE | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Beach tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Cash | Yes | Wildlife | 497250.0 |
| 15 | tour_102 | RWANDA | 65+ | Alone | 1.0 | 0.0 | Leisure and Holidays | Beach tourism | Friends, relatives | Independent | ... | No | No | No | No | 0.0 | 2.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 331500.0 |
| 16 | tour_1021 | AUSTRIA | 45-64 | Friends/Relatives | 4.0 | 1.0 | Visiting Friends and Relatives | Mountain climbing | Friends, relatives | Independent | ... | No | No | No | No | 24.0 | 0.0 | Cash | No | Friendly People | 2000000.0 |
| 17 | tour_1022 | MYANMAR | 25-44 | NaN | 1.0 | 0.0 | Meetings and Conference | Wildlife tourism | Radio, TV, Web | Independent | ... | No | No | No | No | 5.0 | 0.0 | Cash | Yes | Friendly People | 331500.0 |
| 18 | tour_1024 | GERMANY | 25-44 | Children | 1.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 3.0 | 0.0 | Cash | Yes | Friendly People | 2269330.0 |
| 19 | tour_1026 | KENYA | 25-44 | NaN | 1.0 | 0.0 | Business | Mountain climbing | Friends, relatives | Independent | ... | No | No | No | No | 4.0 | 0.0 | Cash | No | Friendly People | 377520.0 |
20 rows × 23 columns
#Information about the tabular data.
Tz.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4809 entries, 0 to 4808
Data columns (total 23 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   ID                     4809 non-null   object
 1   country                4809 non-null   object
 2   age_group              4809 non-null   object
 3   travel_with            3695 non-null   object
 4   total_female           4806 non-null   float64
 5   total_male             4804 non-null   float64
 6   purpose                4809 non-null   object
 7   main_activity          4809 non-null   object
 8   info_source            4809 non-null   object
 9   tour_arrangement       4809 non-null   object
 10  package_transport_int  4809 non-null   object
 11  package_accomodation   4809 non-null   object
 12  package_food           4809 non-null   object
 13  package_transport_tz   4809 non-null   object
 14  package_sightseeing    4809 non-null   object
 15  package_guided_tour    4809 non-null   object
 16  package_insurance      4809 non-null   object
 17  night_mainland         4809 non-null   float64
 18  night_zanzibar         4809 non-null   float64
 19  payment_mode           4809 non-null   object
 20  first_trip_tz          4809 non-null   object
 21  most_impressing        4496 non-null   object
 22  total_cost             4809 non-null   float64
dtypes: float64(5), object(18)
memory usage: 864.2+ KB
#Statistical description of numerical features.
Tz.describe()
| | total_female | total_male | night_mainland | night_zanzibar | total_cost |
|---|---|---|---|---|---|
| count | 4806.000000 | 4804.000000 | 4809.000000 | 4809.000000 | 4.809000e+03 |
| mean | 0.926758 | 1.009575 | 8.488043 | 2.304429 | 8.114389e+06 |
| std | 1.288242 | 1.138865 | 10.427624 | 4.227080 | 1.222490e+07 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.900000e+04 |
| 25% | 0.000000 | 1.000000 | 3.000000 | 0.000000 | 8.121750e+05 |
| 50% | 1.000000 | 1.000000 | 6.000000 | 0.000000 | 3.397875e+06 |
| 75% | 1.000000 | 1.000000 | 11.000000 | 4.000000 | 9.945000e+06 |
| max | 49.000000 | 44.000000 | 145.000000 | 61.000000 | 9.953288e+07 |
Tz.isnull().sum()
ID                          0
country                     0
age_group                   0
travel_with              1114
total_female                3
total_male                  5
purpose                     0
main_activity               0
info_source                 0
tour_arrangement            0
package_transport_int       0
package_accomodation        0
package_food                0
package_transport_tz        0
package_sightseeing         0
package_guided_tour         0
package_insurance           0
night_mainland              0
night_zanzibar              0
payment_mode                0
first_trip_tz               0
most_impressing           313
total_cost                  0
dtype: int64
The travel_with, total_female, total_male and most_impressing columns have NaN/null values, which need to be filled.
#Travel_with
#Assign the result back instead of using inplace=True on a column selection
Tz['travel_with'] = Tz['travel_with'].fillna('Alone')
Tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | Alone | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4804 | tour_993 | UAE | 45-64 | Alone | 0.0 | 1.0 | Business | Hunting tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Credit Card | No | No comments | 3315000.0 |
| 4805 | tour_994 | UNITED STATES OF AMERICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 11.0 | 0.0 | Cash | Yes | Friendly People | 10690875.0 |
| 4806 | tour_995 | NETHERLANDS | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | others | Independent | ... | No | No | No | No | 3.0 | 7.0 | Cash | Yes | Good service | 2246636.7 |
| 4807 | tour_997 | SOUTH AFRICA | 25-44 | Friends/Relatives | 1.0 | 1.0 | Business | Beach tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | No | Friendly People | 1160250.0 |
| 4808 | tour_999 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 4.0 | 7.0 | Cash | Yes | Friendly People | 13260000.0 |
4809 rows × 23 columns
#Most_impressing
#Assign the result back instead of using inplace=True on a column selection
Tz['most_impressing'] = Tz['most_impressing'].fillna('Friendly People')
Tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | Alone | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4804 | tour_993 | UAE | 45-64 | Alone | 0.0 | 1.0 | Business | Hunting tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Credit Card | No | No comments | 3315000.0 |
| 4805 | tour_994 | UNITED STATES OF AMERICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 11.0 | 0.0 | Cash | Yes | Friendly People | 10690875.0 |
| 4806 | tour_995 | NETHERLANDS | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | others | Independent | ... | No | No | No | No | 3.0 | 7.0 | Cash | Yes | Good service | 2246636.7 |
| 4807 | tour_997 | SOUTH AFRICA | 25-44 | Friends/Relatives | 1.0 | 1.0 | Business | Beach tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | No | Friendly People | 1160250.0 |
| 4808 | tour_999 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 4.0 | 7.0 | Cash | Yes | Friendly People | 13260000.0 |
4809 rows × 23 columns
#total_male
#Backward fill: each missing value takes the next valid value below it
Tz['total_male'] = Tz['total_male'].bfill()
Tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | Alone | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4804 | tour_993 | UAE | 45-64 | Alone | 0.0 | 1.0 | Business | Hunting tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Credit Card | No | No comments | 3315000.0 |
| 4805 | tour_994 | UNITED STATES OF AMERICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 11.0 | 0.0 | Cash | Yes | Friendly People | 10690875.0 |
| 4806 | tour_995 | NETHERLANDS | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | others | Independent | ... | No | No | No | No | 3.0 | 7.0 | Cash | Yes | Good service | 2246636.7 |
| 4807 | tour_997 | SOUTH AFRICA | 25-44 | Friends/Relatives | 1.0 | 1.0 | Business | Beach tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | No | Friendly People | 1160250.0 |
| 4808 | tour_999 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 4.0 | 7.0 | Cash | Yes | Friendly People | 13260000.0 |
4809 rows × 23 columns
#total_female
#Backward fill: each missing value takes the next valid value below it
Tz['total_female'] = Tz['total_female'].bfill()
Tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_0 | SWIZERLAND | 45-64 | Friends/Relatives | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Independent | ... | No | No | No | No | 13.0 | 0.0 | Cash | No | Friendly People | 674602.5 |
| 1 | tour_10 | UNITED KINGDOM | 25-44 | Alone | 1.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | 14.0 | 7.0 | Cash | Yes | Wonderful Country, Landscape, Nature | 3214906.5 |
| 2 | tour_1000 | UNITED KINGDOM | 25-44 | Alone | 0.0 | 1.0 | Visiting Friends and Relatives | Cultural tourism | Friends, relatives | Independent | ... | No | No | No | No | 1.0 | 31.0 | Cash | No | Excellent Experience | 3315000.0 |
| 3 | tour_1002 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 11.0 | 0.0 | Cash | Yes | Friendly People | 7790250.0 |
| 4 | tour_1004 | CHINA | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 7.0 | 4.0 | Cash | Yes | No comments | 1657500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4804 | tour_993 | UAE | 45-64 | Alone | 0.0 | 1.0 | Business | Hunting tourism | Friends, relatives | Independent | ... | No | No | No | No | 2.0 | 0.0 | Credit Card | No | No comments | 3315000.0 |
| 4805 | tour_994 | UNITED STATES OF AMERICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | 11.0 | 0.0 | Cash | Yes | Friendly People | 10690875.0 |
| 4806 | tour_995 | NETHERLANDS | 1-24 | Alone | 1.0 | 0.0 | Leisure and Holidays | Wildlife tourism | others | Independent | ... | No | No | No | No | 3.0 | 7.0 | Cash | Yes | Good service | 2246636.7 |
| 4807 | tour_997 | SOUTH AFRICA | 25-44 | Friends/Relatives | 1.0 | 1.0 | Business | Beach tourism | Travel, agent, tour operator | Independent | ... | No | No | No | No | 5.0 | 0.0 | Credit Card | No | Friendly People | 1160250.0 |
| 4808 | tour_999 | UNITED KINGDOM | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | No | 4.0 | 7.0 | Cash | Yes | Friendly People | 13260000.0 |
4809 rows × 23 columns
#Confirming that no null values remain.
Tz.isnull().sum()
ID                       0
country                  0
age_group                0
travel_with              0
total_female             0
total_male               0
purpose                  0
main_activity            0
info_source              0
tour_arrangement         0
package_transport_int    0
package_accomodation     0
package_food             0
package_transport_tz     0
package_sightseeing      0
package_guided_tour      0
package_insurance        0
night_mainland           0
night_zanzibar           0
payment_mode             0
first_trip_tz            0
most_impressing          0
total_cost               0
dtype: int64
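The fills used above (a fixed label for the categorical columns and a backward fill for the counts) are one of several reasonable choices. As a minimal sketch, assuming a small made-up frame `toy` in place of `Tz`, the same cleanup could be gathered into one hypothetical helper, `fill_missing`. Note that it swaps in different techniques from the cells above: the column median instead of the backward fill, and the column mode instead of the fixed 'Friendly People' label.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for Tz; all values are made up for illustration.
toy = pd.DataFrame({
    "travel_with": ["Alone", np.nan, "Spouse", np.nan],
    "total_male": [1.0, np.nan, 0.0, 2.0],
    "most_impressing": ["Wildlife", "Wildlife", np.nan, "Good service"],
})

def fill_missing(df):
    """Hypothetical helper: return a copy of df with the gaps filled."""
    out = df.copy()
    # Fixed label for a missing travel companion, as in the notebook.
    out["travel_with"] = out["travel_with"].fillna("Alone")
    # Most frequent answer for the free-text category (an assumption).
    out["most_impressing"] = out["most_impressing"].fillna(
        out["most_impressing"].mode()[0]
    )
    # Median instead of backward fill, so the result does not depend on row order.
    out["total_male"] = out["total_male"].fillna(out["total_male"].median())
    return out

clean = fill_missing(toy)
remaining = int(clean.isnull().sum().sum())
print(remaining)
```

Because the helper works on a copy, the original frame keeps its gaps and the two versions can be compared side by side.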
sns.pairplot(Tz)
<seaborn.axisgrid.PairGrid at 0x2430913bf70>
The pair plot above shows the pairwise relationships between the numerical features. It is used here to look for roughly linear trends that a linear regression model could capture.
Tz.plot.scatter(x='total_cost',y='night_mainland')
<AxesSubplot:xlabel='total_cost', ylabel='night_mainland'>
sns.lmplot(x='total_cost',y='night_mainland',data=Tz)
<seaborn.axisgrid.FacetGrid at 0x2430c0b7430>
#Chart showing the distribution of tourist spending (total_cost in millions).
import warnings
warnings.filterwarnings('ignore')
sns.distplot(Tz['total_cost']/10**6).set(title='SPENDING DISTRIBUTION')
[Text(0.5, 1.0, 'SPENDING DISTRIBUTION')]
#Correlation between the numerical features (text columns are excluded explicitly).
Tz.select_dtypes('number').corr()
| | total_female | total_male | night_mainland | night_zanzibar | total_cost |
|---|---|---|---|---|---|
| total_female | 1.000000 | 0.467000 | 0.031233 | 0.138523 | 0.285862 |
| total_male | 0.467000 | 1.000000 | -0.041369 | 0.050172 | 0.183785 |
| night_mainland | 0.031233 | -0.041369 | 1.000000 | -0.118155 | 0.020473 |
| night_zanzibar | 0.138523 | 0.050172 | -0.118155 | 1.000000 | 0.145139 |
| total_cost | 0.285862 | 0.183785 | 0.020473 | 0.145139 | 1.000000 |
sns.heatmap(Tz.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>
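When the goal is to predict `total_cost`, the column of the correlation matrix that belongs to the target is usually the most informative part. The sketch below, on a made-up frame `toy` (its values are assumptions, not taken from the dataset), selects the numeric columns and ranks the features by their correlation with the target.

```python
import pandas as pd

# Toy data for illustration only; "country" is a text column and must be skipped.
toy = pd.DataFrame({
    "total_female": [1, 0, 2, 1, 3],
    "night_mainland": [3, 10, 5, 7, 2],
    "country": ["KENYA", "ITALY", "KENYA", "FRANCE", "ITALY"],
    "total_cost": [2.0, 1.0, 4.0, 2.5, 6.0],
})

corr_with_target = (
    toy.select_dtypes("number")      # keep numeric columns only
       .corr()["total_cost"]         # correlations with the target
       .drop("total_cost")           # the target's correlation with itself is always 1
       .sort_values(ascending=False)
)
print(corr_with_target)
```

A correlation close to zero, such as the 0.02 between night_mainland and total_cost in the table above, suggests that the feature on its own carries little linear signal for the target.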
#import pandas profiling library
from pandas_profiling import ProfileReport
profile = ProfileReport(Tz)
profile
#Total spending per country among the first 8 records (a small sample, not the full dataset)
countries = Tz[['country', 'total_cost']].head(8).groupby('country').sum()
countries
| country | total_cost |
|---|---|
| CHINA | 1657500.0 |
| SOUTH AFRICA | 466140.0 |
| SWIZERLAND | 674602.5 |
| UNITED KINGDOM | 14441106.5 |
| UNITED STATES OF AMERICA | 3480750.0 |
#CHART SHOWING TOP 5 COUNTRIES WITH HIGHEST SPENDING STATISTICS
plt.style.use('seaborn')
countries.plot(figsize=(12,3), color='red', legend=False, title='TOP 5 COUNTRIES WITH HIGHEST SPENDING STATISTICS')
<AxesSubplot:title={'center':'TOP 5 COUNTRIES WITH HIGHEST SPENDING STATISTICS'}, xlabel='country'>
Of the five countries that appear in the first eight records, the United Kingdom has the highest total spending. Because only eight rows are aggregated, this is not a ranking over the full dataset.
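The aggregation above covers only the first eight rows. A ranking over every record would typically group the whole frame and keep the five largest totals. A minimal sketch on made-up data (the frame `toy`, its country labels and its values are assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy data for illustration; single letters stand in for country names.
toy = pd.DataFrame({
    "country": ["A", "B", "A", "C", "D", "E", "F", "B"],
    "total_cost": [10.0, 5.0, 20.0, 7.0, 1.0, 3.0, 2.0, 30.0],
})

# Sum spending per country over all rows, then keep the five largest totals.
top5 = toy.groupby("country")["total_cost"].sum().nlargest(5)
print(top5)
```

On the real data the same pattern would be `Tz.groupby('country')['total_cost'].sum().nlargest(5)`; a bar chart (`kind='bar'`) suits the result better than a line plot, because countries are categories with no natural order.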
#COUNTRIES WITH THE MOST TOURISTS VISITING TANZANIA.
Tz[['country']].value_counts()
country
UNITED STATES OF AMERICA 695
UNITED KINGDOM 533
ITALY 393
FRANCE 280
ZIMBABWE 274
...
ANGOLA 1
MONTENEGRO 1
MORROCO 1
MYANMAR 1
MADAGASCAR 1
Length: 105, dtype: int64
Tz[['country']].value_counts().head(20).plot(kind='barh', title='COUNTRIES WITH HIGHEST TOURIST CITIZENS VISITING TANZANIA')
<AxesSubplot:title={'center':'COUNTRIES WITH HIGHEST TOURIST CITIZENS VISITING TANZANIA'}, ylabel='country'>
#Which age group spends the most, and which travel-companion category spends the most overall?
Age = Tz[['age_group','total_cost','travel_with']]
Age
| | age_group | total_cost | travel_with |
|---|---|---|---|
| 0 | 45-64 | 674602.5 | Friends/Relatives |
| 1 | 25-44 | 3214906.5 | Alone |
| 2 | 25-44 | 3315000.0 | Alone |
| 3 | 25-44 | 7790250.0 | Spouse |
| 4 | 1-24 | 1657500.0 | Alone |
| ... | ... | ... | ... |
| 4804 | 45-64 | 3315000.0 | Alone |
| 4805 | 25-44 | 10690875.0 | Spouse |
| 4806 | 1-24 | 2246636.7 | Alone |
| 4807 | 25-44 | 1160250.0 | Friends/Relatives |
| 4808 | 25-44 | 13260000.0 | Spouse |
4809 rows × 3 columns
Age.describe(include = 'all')
| | age_group | total_cost | travel_with |
|---|---|---|---|
| count | 4809 | 4.809000e+03 | 4809 |
| unique | 4 | NaN | 5 |
| top | 25-44 | NaN | Alone |
| freq | 2487 | NaN | 2379 |
| mean | NaN | 8.114389e+06 | NaN |
| std | NaN | 1.222490e+07 | NaN |
| min | NaN | 4.900000e+04 | NaN |
| 25% | NaN | 8.121750e+05 | NaN |
| 50% | NaN | 3.397875e+06 | NaN |
| 75% | NaN | 9.945000e+06 | NaN |
| max | NaN | 9.953288e+07 | NaN |
Age.groupby('age_group').sum()/10**6
| age_group | total_cost |
|---|---|
| 1-24 | 3379.088150 |
| 25-44 | 14987.099938 |
| 45-64 | 15371.839260 |
| 65+ | 5284.068284 |
#HIGHEST SPENDING AGE GROUP
Age.groupby('age_group').sum().plot(kind='bar',title='HIGHEST SPENDING AGE GROUP')
<AxesSubplot:title={'center':'HIGHEST SPENDING AGE GROUP'}, xlabel='age_group'>
The 45-64 age group has the highest total spending (about 15,372 million), narrowly ahead of the 25-44 group (about 14,987 million).
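A group total depends on how many tourists fall into the group as well as on how much each one spends. To separate the two effects, the count, sum, mean and median can be computed side by side. The sketch below uses a made-up frame `toy` (an assumption for illustration) in which the larger group has the higher total but the lower spend per tourist.

```python
import pandas as pd

# Toy data for illustration: four tourists aged 25-44, two aged 65+.
toy = pd.DataFrame({
    "age_group": ["25-44", "25-44", "25-44", "25-44", "65+", "65+"],
    "total_cost": [2.0, 2.0, 2.0, 2.0, 3.0, 3.0],
})

# One row per age group, one column per statistic.
summary = toy.groupby("age_group")["total_cost"].agg(["count", "sum", "mean", "median"])
print(summary)
```

On the real data the same call on `Age` would show whether the 45-64 group also leads on spending per tourist, given that it has 1391 records against 2487 for the 25-44 group.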
Age.groupby('travel_with').sum()
| travel_with | total_cost |
|---|---|
| Alone | 8.717835e+09 |
| Children | 1.653502e+09 |
| Friends/Relatives | 9.158700e+09 |
| Spouse | 1.274631e+10 |
| Spouse and Children | 6.745753e+09 |
#Number of tourists per age group.
Age[['age_group']].value_counts()
age_group
25-44        2487
45-64        1391
1-24          624
65+           307
dtype: int64
Age[['age_group']].value_counts().plot(kind='pie', autopct='%1.1f%%')
<AxesSubplot:ylabel='None'>
The pie chart shows the share of each age group among tourists visiting Tanzania.
#COUNTRY OF THE HIGHEST-SPENDING TOURIST (FIRST 10 RECORDS)
Tz[['country','total_cost']].head(10)
| | country | total_cost |
|---|---|---|
| 0 | SWIZERLAND | 674602.5 |
| 1 | UNITED KINGDOM | 3214906.5 |
| 2 | UNITED KINGDOM | 3315000.0 |
| 3 | UNITED KINGDOM | 7790250.0 |
| 4 | CHINA | 1657500.0 |
| 5 | UNITED KINGDOM | 120950.0 |
| 6 | SOUTH AFRICA | 466140.0 |
| 7 | UNITED STATES OF AMERICA | 3480750.0 |
| 8 | NIGERIA | 994500.0 |
| 9 | INDIA | 2486250.0 |
Tz[['country','total_cost']].head(10).plot(kind='bar',title='COUNTRY WITH THE MOST SPENDING TOURIST')
<AxesSubplot:title={'center':'COUNTRY WITH THE MOST SPENDING TOURIST'}>
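Among the first ten records, the highest-spending tourist (record 3, with a total cost of 7,790,250) is from the United Kingdom, even though the United Kingdom ranks second to the USA in the number of tourists visiting Tanzania. Ten rows are too few to generalise to the full dataset.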
#AVERAGE NUMBER OF NIGHTS TOURISTS SPEND ON THE TANZANIA MAINLAND
Tz[['night_mainland']].mean()
night_mainland    8.488043
dtype: float64
On average, a tourist spends about eight nights (8.49) on the Tanzania mainland.
#AVERAGE NUMBER OF NIGHTS TOURISTS SPEND IN ZANZIBAR
Tz[['night_zanzibar']].mean()
night_zanzibar    2.304429
dtype: float64
On average, a tourist spends about two nights (2.30) in Zanzibar.
#MOST PREFERRED PAYMENT METHOD AMONG TOURISTS
Tz[['payment_mode']].head(30)
| | payment_mode |
|---|---|
| 0 | Cash |
| 1 | Cash |
| 2 | Cash |
| 3 | Cash |
| 4 | Cash |
| 5 | Cash |
| 6 | Cash |
| 7 | Cash |
| 8 | Cash |
| 9 | Credit Card |
| 10 | Cash |
| 11 | Cash |
| 12 | Cash |
| 13 | Cash |
| 14 | Cash |
| 15 | Cash |
| 16 | Cash |
| 17 | Cash |
| 18 | Cash |
| 19 | Cash |
| 20 | Cash |
| 21 | Cash |
| 22 | Credit Card |
| 23 | Cash |
| 24 | Cash |
| 25 | Cash |
| 26 | Credit Card |
| 27 | Cash |
| 28 | Cash |
| 29 | Cash |
type(Tz['payment_mode'].iloc[0])
str
#CHART SHOWING THE PAYMENT MODES MOST USED BY TOURISTS
plt.style.use('ggplot')
sns.catplot(data=Tz, x='payment_mode', kind='count').set(title='PAYMENT MODE USED')
<seaborn.axisgrid.FacetGrid at 0x2431a874790>
The visualization above indicates that most tourists preferred to pay in cash.
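The count plot shows which payment mode dominates, but not by how much. As a small sketch, assuming a made-up series `toy` in place of `Tz['payment_mode']`, `value_counts(normalize=True)` turns the counts into shares.

```python
import pandas as pd

# Toy values for illustration only.
toy = pd.Series(["Cash", "Cash", "Credit Card", "Cash"], name="payment_mode")

# Fraction of records per payment mode, largest first.
share = toy.value_counts(normalize=True)
print(share)
```

Multiplying the result by 100 gives percentages that can be quoted next to the chart.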
#TOURISM MAIN ACTIVITIES IN TANZANIA
Tz[['main_activity']].head(10)
| | main_activity |
|---|---|
| 0 | Wildlife tourism |
| 1 | Cultural tourism |
| 2 | Cultural tourism |
| 3 | Wildlife tourism |
| 4 | Wildlife tourism |
| 5 | Wildlife tourism |
| 6 | Mountain climbing |
| 7 | Wildlife tourism |
| 8 | Cultural tourism |
| 9 | Wildlife tourism |
sns.catplot(data=Tz, y='main_activity', kind='count')
<seaborn.axisgrid.FacetGrid at 0x2431aa3d760>
Tourists mostly engage in wildlife and beach tourism, with wildlife tourism being the most popular activity.
Tz[['package_food']]
| | package_food |
|---|---|
| 0 | No |
| 1 | No |
| 2 | No |
| 3 | Yes |
| 4 | No |
| ... | ... |
| 4804 | No |
| 4805 | Yes |
| 4806 | No |
| 4807 | Yes |
| 4808 | Yes |
4809 rows × 1 columns
sns.catplot(data=Tz, x='package_food', kind='count')
<seaborn.axisgrid.FacetGrid at 0x243195d83d0>
The observation above indicates that most tourists did not have food included in their package. This could be because choosing their own meals is tastier, safer, or more flexible.
#FEATURE ENGINEERING
#Getting arrays with features to train on
Tz.columns
Index(['ID', 'country', 'age_group', 'travel_with', 'total_female',
'total_male', 'purpose', 'main_activity', 'info_source',
'tour_arrangement', 'package_transport_int', 'package_accomodation',
'package_food', 'package_transport_tz', 'package_sightseeing',
'package_guided_tour', 'package_insurance', 'night_mainland',
'night_zanzibar', 'payment_mode', 'first_trip_tz', 'most_impressing',
'total_cost'],
dtype='object')
#PREPARING DATA FOR MODELLING
#features for modelling
X = Tz[['total_female',
'total_male','night_mainland','night_zanzibar'
]]
#Target variable or Prediction variable.
Y = Tz[['total_cost']]
#Train test split
from sklearn.model_selection import train_test_split
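#train_test_split returns its pieces in the order (train, test, train, test). With test_size=0.9 the first
#piece of each pair holds 10% of the rows, so this swapped unpacking trains on 90% and tests on 10%.
X_test,X_train, Y_test,Y_train = train_test_split(X,Y, test_size=0.9, random_state=60)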
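`train_test_split` returns, for each array passed in, a train piece followed by a test piece, and `test_size` sets the fraction of rows that goes to the test piece. The sketch below uses a made-up 20-row array (not the tourism data) to show which piece ends up with which size when `test_size=0.9`; the names `X_demo`, `y_demo`, `first_X` and so on are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 20 rows, 2 features
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# Documented return order: X_train, X_test, y_train, y_test
first_X, second_X, first_y, second_y = train_test_split(
    X_demo, y_demo, test_size=0.9, random_state=60
)

# The first piece is the "train" part (10% here), the second is the "test" part (90%)
print(len(first_X), len(second_X))
```

Whichever variable names receive the second and fourth return values therefore hold 90% of the rows when `test_size=0.9`.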
#Creating and Training the model
from sklearn.linear_model import LinearRegression
#Instantiate model
lm = LinearRegression()
lm.fit(X_train,Y_train)
LinearRegression()
print(lm.intercept_)
[4462382.63431905]
lm.coef_
array([[2071453.64220804, 748866.97405083, 35552.52558308,
293057.24098337]])
X_train.columns
Index(['total_female', 'total_male', 'night_mainland', 'night_zanzibar'], dtype='object')
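A fitted linear regression predicts by adding the intercept to the sum of each coefficient times its feature value, with the coefficients in the same order as `X_train.columns`. As a check, the sketch below plugs the printed intercept and coefficients into that formula for one tourist with 0 females, 1 male, 3 nights on the mainland and 0 nights in Zanzibar (the feature values of row 4284 in the `X_test` output shown later in the notebook). The helper name `predict_cost` is hypothetical.

```python
# Values copied from the lm.intercept_ and lm.coef_ outputs above
intercept = 4462382.63431905
coefficients = {
    "total_female": 2071453.64220804,
    "total_male": 748866.97405083,
    "night_mainland": 35552.52558308,
    "night_zanzibar": 293057.24098337,
}

def predict_cost(features):
    """Return intercept + sum(coefficient * feature value)."""
    return intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )

# Feature values of row 4284 in X_test
row_4284 = {
    "total_female": 0.0,
    "total_male": 1.0,
    "night_mainland": 3.0,
    "night_zanzibar": 0.0,
}
print(predict_cost(row_4284))
```

The result agrees with the first entry of the `prediction` array (about 5,317,907) up to rounding of the printed coefficients.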
prediction = lm.predict(X_test)
#Predicted total cost for each tourist in the held-out split
prediction
array([[ 5317907.18511913],
[ 5602327.3897838 ],
[ 9334103.93746147],
[ 5708984.96653305],
[ 5389012.2362853 ],
[ 5957852.64561464],
[ 5246802.13395296],
[ 6676046.37885942],
[ 5424564.76186838],
[ 5246802.13395296],
[ 6102326.68475304],
[ 9582971.61654306],
[12636660.34162491],
[ 5282354.65953605],
[ 7389360.82732717],
[10458549.12266762],
[ 5460117.28745147],
[ 6410394.06156773],
[ 8778173.26187351],
[ 5353459.71070221],
[ 8458200.53162575],
[ 5424564.76186838],
[ 9582482.18211578],
[ 5246802.13395296],
[ 5531222.33861763],
[10799390.1423783 ],
[ 7496018.40407642],
[ 9334103.93746147],
[ 5682069.47726866],
[ 5744537.49211614],
[ 8988220.09825764],
[ 6137879.21033613],
[ 7780438.60874109],
[ 6960466.58352409],
[ 9103514.71132558],
[10600759.22499996],
[ 5246802.13395296],
[10316339.02033529],
[13175806.37900277],
[ 7531570.9296595 ],
[ 5566774.86420072],
[11130778.79163184],
[ 9334103.93746147],
[ 6552604.16390007],
[ 6604941.32769325],
[ 5353459.71070221],
[ 7389360.82732717],
[ 6640493.85327634],
[21589473.88347934],
[13477011.22187755],
[ 5389012.2362853 ],
[ 8410976.91084744],
[ 5389012.2362853 ],
[ 6604941.32769325],
[ 9041046.69647811],
[ 5317907.18511913],
[ 5460117.28745147],
[10924326.17207325],
[ 9716544.6825567 ],
[ 5282354.65953605],
[ 5282354.65953605],
[ 6457851.94058388],
[ 7067124.16027335],
[ 7531570.9296595 ],
[ 6747151.43002559],
[11954345.03367437],
[ 8703799.89359296],
[ 7460465.87849334],
[11385504.62434503],
[ 7496018.40407642],
[12596064.99654867],
[15850674.08726867],
[ 7709333.55757492],
[ 8529305.58279192],
[ 7031571.63469026],
[10133533.57292671],
[ 6953323.36194099],
[14173051.59770791],
[ 8846009.9959253 ],
[ 6277825.3758624 ],
[ 9760734.24445848],
[ 7673781.03199184],
[10396081.10782015],
[ 5282354.65953605],
[ 6676046.37885942],
[ 7531570.9296595 ],
[ 5246802.13395296],
[ 5424564.76186838],
[ 5246802.13395296],
[ 9041046.69647811],
[ 5460117.28745147],
[20337828.73177218],
[ 7884480.83751937],
[ 9260734.94948924],
[ 5460117.28745147],
[11888282.80200136],
[ 9674129.62303371],
[ 5317907.18511913],
[ 5566774.86420072],
[ 9531919.52070137],
[ 5282354.65953605],
[ 6031221.63358688],
[ 6711598.90444251],
[10799390.1423783 ],
[10890058.71444166],
[11903292.93783269],
[11982545.59089026],
[ 6676046.37885942],
[ 5460117.28745147],
[ 6889361.53235793],
[ 5282354.65953605],
[ 5282354.65953605],
[ 7166638.5154395 ],
[ 7546581.06549082],
[ 7531570.9296595 ],
[ 5708984.96653305],
[ 6711598.90444251],
[ 7638228.50640875],
[ 6889361.53235793],
[ 7602675.98082567],
[ 5708984.96653305],
[ 7638228.50640875],
[ 9911091.94868223],
[17719081.45029487],
[ 7353808.30174408],
[ 6640493.85327634],
[ 9103025.2768983 ],
[ 7351544.36493802],
[ 6640493.85327634],
[ 6889361.53235793],
[ 9929018.99046302],
[ 9103025.2768983 ],
[ 8668247.36800988],
[10411091.24365147],
[ 6604941.32769325],
[11991672.06163624],
[12048771.3572794 ],
[ 7031571.63469026],
[ 7839638.30647419],
[ 7673781.03199184],
[ 5317907.18511913],
[ 5744537.49211614],
[ 9654076.66770923],
[ 5282354.65953605],
[ 9496856.42954557],
[ 6339289.01040156],
[ 5282354.65953605],
[ 8061590.49629139],
[ 6171167.79911315],
[ 5424564.76186838],
[ 4960117.99248222],
[ 6031221.63358688],
[ 5317907.18511913],
[13915546.88230762],
[ 6815641.13322079],
[ 8363284.77359345],
[ 8996857.13457633],
[ 8747989.45549474],
[ 6161526.38248614],
[ 7496018.40407642],
[ 7673781.03199184],
[12110234.99181856],
[ 5460117.28745147],
[ 7709333.55757492],
[ 7067124.16027335],
[12098493.17310162],
[ 8141332.58377625],
[ 7493754.46727035],
[ 5353459.71070221],
[ 5389012.2362853 ],
[11567820.63732633],
[ 5424564.76186838],
[ 5922300.12003156],
[ 7262650.29525344],
[ 7325118.31010091],
[ 6117336.82058436],
[ 8454932.21451138],
[ 6315641.83825155],
[ 5317907.18511913],
[11231063.26886847],
[ 7875190.83205727],
[ 7318255.776161 ],
[ 9041046.69647811],
[ 6747151.43002559],
[13021854.45795349],
[ 8996857.13457633],
[ 9103025.2768983 ],
[ 5317907.18511913],
[ 7638228.50640875],
[ 7353808.30174408],
[ 5708984.96653305],
[ 8668247.36800988],
[ 5353459.71070221],
[ 6811113.25960866],
[ 5353459.71070221],
[ 8961304.60899325],
[13095574.85709063],
[ 5353459.71070221],
[ 5957852.64561464],
[ 5424564.76186838],
[18117653.8644568 ],
[ 6640493.85327634],
[ 5424564.76186838],
[ 6676046.37885942],
[ 5815642.54328231],
[ 7262650.29525344],
[ 6640493.85327634],
[ 8280437.90371033],
[11231063.26886847],
[ 5282354.65953605],
[ 9334103.93746147],
[ 9135962.45451041],
[ 7460465.87849334],
[ 6640493.85327634],
[ 5531222.33861763],
[ 7531570.9296595 ],
[ 6102326.68475304],
[ 5246802.13395296],
[ 6782703.95560867],
[ 9733563.57900464],
[ 5424564.76186838],
[ 6330651.97408287],
[ 5353459.71070221],
[ 7031571.63469026],
[ 5637879.91536689],
[11226020.44937531],
[ 7582133.59107391],
[10625130.08981509],
[ 6889361.53235793],
[ 6782703.95560867],
[13291615.93795174],
[ 7274555.6486865 ],
[20597622.89543227],
[ 6359831.40015332],
[ 7600412.04401961],
[ 5531222.33861763],
[ 6517051.63831698],
[ 7244886.78818876],
[ 9031920.22573213],
[ 9254361.84997661],
[ 5389012.2362853 ],
[ 5282354.65953605],
[ 5246802.13395296],
[ 5317907.18511913],
[ 5637879.91536689],
[ 7280439.31377185],
[ 6960466.58352409],
[12442113.07549939],
[ 5353459.71070221],
[ 7280439.31377185],
[ 5424564.76186838],
[ 6640493.85327634],
[ 7638228.50640875],
[ 6711598.90444251],
[ 6223994.39733362],
[ 7324628.87567363],
[10082970.9115123 ],
[11678561.8653284 ],
[ 5646516.95168558],
[11781625.22525211],
[ 5353459.71070221],
[ 8176885.10935933],
[ 9361019.42672586],
[ 6028957.69678081],
[ 5424564.76186838],
[ 7424913.35291025],
[ 7999122.48144391],
[ 6552604.16390007],
[ 9022513.06734296],
[ 5246802.13395296],
[12101597.95549987],
[ 5353459.71070221],
[ 9103025.2768983 ],
[10538780.64457977],
[ 7575760.49156128],
[ 9582971.61654306],
[ 6351194.36383463],
[ 9547419.09095998],
[ 5993405.17119773],
[ 5317907.18511913],
[ 6398488.70813466],
[15048955.00354363],
[ 8351542.9548765 ],
[ 8659610.33169119],
[ 7424913.35291025],
[ 6315641.83825155],
[ 9041046.69647811],
[ 7389360.82732717],
[ 5246802.13395296],
[ 5353459.71070221],
[ 5317907.18511913],
[ 5389012.2362853 ],
[15317875.63794969],
[10280786.4947522 ],
[ 7111803.15660241],
[ 5602327.3897838 ],
[ 7600412.04401961],
[12953854.18918558],
[ 5424564.76186838],
[ 6604941.32769325],
[ 9334103.93746147],
[ 7496018.40407642],
[ 8747989.45549474],
[ 5282354.65953605],
[ 9183256.79881044],
[10568311.48181513],
[10861368.72279849],
[ 7496018.40407642],
[ 7111803.15660241],
[ 8561589.79126063],
[14462351.08714961],
[ 7839638.30647419],
[ 9334103.93746147],
[ 7699926.39918576],
[ 7496018.40407642],
[ 5211249.60836988],
[11421057.14992812],
[ 5460117.28745147],
[10494591.08267799],
[ 5353459.71070221],
[ 8996857.13457633],
[ 8339637.60144343],
[ 5708984.96653305],
[ 6410394.06156773],
[ 6493404.46616697],
[ 7744886.08315801],
[ 5531222.33861763],
[ 7531570.9296595 ],
[ 7460465.87849334],
[ 6889361.53235793],
[ 5317907.18511913],
[ 8161874.97352801],
[ 7887096.18549034],
[ 7531570.9296595 ],
[ 6960466.58352409],
[ 9289914.3755597 ],
[ 7709333.55757492],
[ 6173431.73591921],
[ 7280439.31377185],
[ 7460465.87849334],
[ 5753174.52843483],
[ 6925403.49236829],
[ 9334103.93746147],
[ 9538782.05464128],
[14133271.58677011],
[12387792.66254332],
[ 7661875.67855877],
[ 9103514.71132558],
[ 5282354.65953605],
[12154424.55372034],
[ 5317907.18511913],
[ 6747151.43002559],
[ 5531222.33861763],
[ 5282354.65953605],
[ 8454932.21451138],
[ 8363284.77359345],
[ 9887444.77653221],
[ 5317907.18511913],
[ 7709333.55757492],
[ 6676046.37885942],
[ 9662224.26960064],
[ 7602675.98082567],
[21706894.41009097],
[ 6386746.88941772],
[ 7460465.87849334],
[ 7344401.14335492],
[ 8834104.64249223],
[ 8161874.97352801],
[ 8541047.40150887],
[ 5780090.01769922],
[ 8668247.36800988],
[ 5566774.86420072],
[ 9281277.339241 ],
[ 5995669.10800379],
[ 5353459.71070221],
[ 9334103.93746147],
[ 6031221.63358688],
[ 6066774.15916996],
[ 8810457.47034222],
[ 8925752.08341016],
[11678561.8653284 ],
[ 7999122.48144391],
[ 7531570.9296595 ],
[ 8339637.60144343],
[ 5708984.96653305],
[ 6173431.73591921],
[ 6604941.32769325],
[ 8349279.01807044],
[ 8461305.31402401],
[ 5637879.91536689],
[ 7839638.30647419],
[ 5531222.33861763],
[ 5246802.13395296],
[ 5282354.65953605],
[ 6640493.85327634],
[ 6676046.37885942],
[11195510.74328538],
[ 6640493.85327634],
[ 6996019.10910718],
[ 6676046.37885942],
[ 5424564.76186838],
[ 6676046.37885942],
[ 9582971.61654306],
[ 5637879.91536689],
[11654425.2587511 ],
[ 6604941.32769325],
[ 7067613.59470063],
[ 5246802.13395296],
[ 7804085.78089111],
[ 7067124.16027335],
[ 5895384.63076717],
[ 9831839.29562465],
[13241053.27653734],
[ 8552952.75494194],
[ 6604941.32769325],
[ 6711598.90444251],
[ 8454932.21451138],
[ 5939574.19266894],
[ 6853809.00677484],
[10194742.03127643],
[ 6782703.95560867],
[12952105.19826053],
[ 9745724.10862716],
[ 9567472.04628446],
[ 5246802.13395296],
[ 5637879.91536689],
[26903467.72825819],
[ 6552604.16390007],
[ 6676046.37885942],
[ 6031221.63358688],
[ 5939574.19266894],
[ 8176885.10935933],
[ 6569388.80211017],
[ 5460117.28745147],
[ 5424564.76186838],
[ 9582971.61654306],
[ 5246802.13395296],
[ 9334103.93746147],
[ 8588505.28052502],
[ 5317907.18511913],
[ 5708984.96653305],
[ 5575411.90051941],
[ 6782703.95560867],
[ 8286811.00322296],
[ 7780438.60874109],
[ 6711598.90444251],
[ 8668247.36800988],
[11275252.83077024],
[ 6640493.85327634],
[11848972.52487662],
[ 7638228.50640875],
[ 6676046.37885942],
[ 6102326.68475304],
[ 7280928.74819913],
[ 9254361.84997661],
[ 8996367.70014905],
[ 5317907.18511913],
[ 7496018.40407642],
[ 5531222.33861763],
[10390548.85389971],
[19268010.15861903],
[ 5708984.96653305],
[ 7424913.35291025],
[ 5389012.2362853 ],
[15604070.34499315],
[ 8925262.64898288],
[ 7318255.776161 ],
[ 5424564.76186838],
[ 5317907.18511913],
[ 8260384.94838585],
[ 8854157.59781671],
[ 8747989.45549474],
[ 8925752.08341016],
[ 5173433.14598073],
[ 8552952.75494194],
[ 9334103.93746147],
[ 7353808.30174408],
[ 7531570.9296595 ],
[ 5282354.65953605],
[ 6640493.85327634]])
#Y_test holds the actual total cost for the same tourists
Y_test
| | total_cost |
|---|---|
| 4284 | 130000.0 |
| 3401 | 8453250.0 |
| 3204 | 7293000.0 |
| 3362 | 497250.0 |
| 3822 | 6132750.0 |
| ... | ... |
| 2147 | 6298500.0 |
| 1418 | 3480750.0 |
| 3654 | 8619000.0 |
| 3137 | 600000.0 |
| 2253 | 497250.0 |
480 rows × 1 columns
#Comparing Y_test to the prediction (both in millions) to see how far the predictions fall from the actual values
plt.scatter(Y_test/10**6, prediction/10**6)
<matplotlib.collections.PathCollection at 0x2431ae2a1f0>
#Plotting the distribution of the residuals (actual minus predicted, in millions)
sns.distplot((Y_test - prediction)/10**6)
<AxesSubplot:ylabel='Density'>
EVALUATING THE MODEL with loss functions that quantify its errors.
REGRESSION EVALUATION METRICS:
#Looking at the Linear evaluation metrics we get:
from sklearn import metrics
#Average error of the model
metrics.mean_absolute_error(Y_test,prediction)
7232385.834148328
#Mean squared error: penalises larger errors more heavily by squaring them.
metrics.mean_squared_error(Y_test,prediction)
121440898468113.73
#RMSE: the error expressed in the same units as Y.
np.sqrt(metrics.mean_squared_error(Y_test,prediction))
11020022.616497379
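The three metrics above are simple functions of the residuals. The sketch below recomputes them by hand on five actual/predicted pairs copied from the first rows of `Y_test` and `prediction` (predictions rounded to two decimals), so the printed values describe those five rows only, not the whole split.

```python
import math

# First five actual costs from Y_test and the matching model predictions
actual = [130000.0, 8453250.0, 7293000.0, 497250.0, 6132750.0]
predicted = [5317907.19, 5602327.39, 9334103.94, 5708984.97, 5389012.24]

residuals = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(r) for r in residuals) / len(residuals)  # mean absolute error
mse = sum(r ** 2 for r in residuals) / len(residuals)  # mean squared error
rmse = math.sqrt(mse)                                  # back in the units of total_cost

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
```

RMSE is never smaller than MAE. A large gap between them, as in the notebook's output (about 11.0 million versus 7.2 million), points to a few very large errors.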
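#R^2 over the full dataset (training and held-out rows together)
r_squared = lm.score(X, Y)
print(r_squared)
0.09682259179087571
#R^2 on the held-out split; score() needs the true targets, since scoring predictions against themselves always gives 1.0
r_squared = lm.score(X_test,Y_test)
print(r_squared)
0.1787447445823661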
from sklearn.metrics import r2_score
r2_score(Y_test,prediction)
0.1787447445823661
r2_score(Y_test,prediction).dtype
dtype('float64')
The model's r2_score is approximately 0.18, meaning the four features explain only about 18% of the variance in total_cost on the held-out split.
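R² compares the model's squared errors with those of a baseline that always predicts the mean of the actual values: R² = 1 - SS_res / SS_tot. The sketch below implements that definition in plain Python (the function name `r_squared_by_hand` is hypothetical) and applies it to five actual/predicted pairs copied from the first rows of `Y_test` and `prediction`. It also shows that scoring predictions against themselves gives exactly 1.0 whatever the model, so such a score carries no information.

```python
def r_squared_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# First five actual costs from Y_test and the matching predictions (rounded)
actual = [130000.0, 8453250.0, 7293000.0, 497250.0, 6132750.0]
predicted = [5317907.19, 5602327.39, 9334103.94, 5708984.97, 5389012.24]

print(r_squared_by_hand(actual, predicted))     # R^2 on these five rows only
print(r_squared_by_hand(predicted, predicted))  # 1.0: zero residuals by construction
```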
Importing a second dataset (Test .csv) to apply the model to:
from sklearn.svm import SVC
tz = pd.read_csv('Test .csv')
tz
| | ID | country | age_group | travel_with | total_female | total_male | purpose | main_activity | info_source | tour_arrangement | ... | package_food | package_transport_tz | package_sightseeing | package_guided_tour | package_insurance | night_mainland | night_zanzibar | payment_mode | first_trip_tz | most_impressing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tour_1 | AUSTRALIA | 45-64 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Travel, agent, tour operator | Package Tour | ... | Yes | Yes | Yes | Yes | Yes | 10 | 3 | Cash | Yes | Wildlife |
| 1 | tour_100 | SOUTH AFRICA | 25-44 | Friends/Relatives | 0.0 | 4.0 | Business | Wildlife tourism | Tanzania Mission Abroad | Package Tour | ... | No | No | No | No | No | 13 | 0 | Cash | No | Wonderful Country, Landscape, Nature |
| 2 | tour_1001 | GERMANY | 25-44 | Friends/Relatives | 3.0 | 0.0 | Leisure and Holidays | Beach tourism | Friends, relatives | Independent | ... | No | No | No | No | No | 7 | 14 | Cash | No | No comments |
| 3 | tour_1006 | CANADA | 24-Jan | Friends/Relatives | 2.0 | 0.0 | Leisure and Holidays | Cultural tourism | others | Independent | ... | No | No | No | No | No | 0 | 4 | Cash | Yes | Friendly People |
| 4 | tour_1009 | UNITED KINGDOM | 45-64 | Friends/Relatives | 2.0 | 2.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Package Tour | ... | Yes | Yes | No | No | No | 10 | 0 | Cash | Yes | Friendly People |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1596 | tour_988 | UNITED STATES OF AMERICA | 25-44 | NaN | 0.0 | 1.0 | Meetings and Conference | Mountain climbing | Newspaper, magazines,brochures | Independent | ... | No | No | No | No | No | 1 | 0 | Cash | No | NaN |
| 1597 | tour_990 | ITALY | 45-64 | Spouse and Children | 3.0 | 1.0 | Leisure and Holidays | Wildlife tourism | Friends, relatives | Package Tour | ... | Yes | Yes | Yes | No | No | 10 | 5 | Other | Yes | Wildlife |
| 1598 | tour_992 | FINLAND | 25-44 | Alone | 0.0 | 1.0 | Meetings and Conference | Mountain climbing | Friends, relatives | Independent | ... | No | No | No | No | No | 6 | 0 | Cash | Yes | No comments |
| 1599 | tour_996 | SOUTH AFRICA | 24-Jan | Alone | 0.0 | 1.0 | Business | Beach tourism | Friends, relatives | Independent | ... | No | No | No | No | No | 4 | 0 | Cash | Yes | Wildlife |
| 1600 | tour_998 | SOUTH AFRICA | 25-44 | Spouse | 1.0 | 1.0 | Leisure and Holidays | Cultural tourism | Radio, TV, Web | Independent | ... | No | No | No | No | No | 9 | 5 | Cash | Yes | Friendly People |
1601 rows × 22 columns
tz.corr()
| | total_female | total_male | night_mainland | night_zanzibar |
|---|---|---|---|---|
| total_female | 1.000000 | 0.288933 | 0.015265 | 0.078020 |
| total_male | 0.288933 | 1.000000 | -0.035880 | 0.020622 |
| night_mainland | 0.015265 | -0.035880 | 1.000000 | 0.516262 |
| night_zanzibar | 0.078020 | 0.020622 | 0.516262 | 1.000000 |
sns.heatmap(tz.corr(), annot=True)
<AxesSubplot:>
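#Refitting the model on all labelled rows from Train .csv
lm.fit(X,Y)
LinearRegression()
#Note: X_test is the held-out slice of Train .csv, not the Test .csv data loaded into tz
Y_pred = lm.predict(X_test)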
Y_pred
array([[ 5082921.44689458],
[ 5356361.05250342],
[ 9505993.91042197],
[ 5458900.90460673],
[ 5151281.34829679],
[ 5698160.55951447],
[ 5014561.54549237],
[ 6635229.06918205],
[ 5185461.29899789],
[ 5014561.54549237],
[ 5851962.15137201],
[ 9745253.56532971],
[12918546.98377004],
[ 5048741.49619347],
[ 7335909.87225727],
[10563018.70630403],
[ 5219641.249699 ],
[ 6261982.81835893],
[ 8686009.73865349],
[ 5117101.39759568],
[ 8378390.18234354],
[ 5185461.29899789],
[ 9776798.26095763],
[ 5014561.54549237],
[ 5288001.15110121],
[11129296.68918484],
[ 7438449.72436059],
[ 9505993.91042197],
[ 5475941.90404936],
[ 5493080.85530784],
[ 9044711.50368083],
[ 5886142.10207311],
[ 7711889.32996943],
[ 6908668.67479089],
[ 9198472.30592788],
[10699738.50910845],
[ 5014561.54549237],
[10426298.90349961],
[13604544.55962314],
[ 7472629.67506169],
[ 5322181.10180231],
[11366059.83044573],
[ 9505993.91042197],
[ 6398702.62116335],
[ 6566869.16777984],
[ 5117101.39759568],
[ 7335909.87225727],
[ 6601049.11848095],
[22294929.97428402],
[14011970.76114736],
[ 5151281.34829679],
[ 8056577.15789072],
[ 5151281.34829679],
[ 6566869.16777984],
[ 9181333.3546694 ],
[ 5082921.44689458],
[ 5219641.249699 ],
[11163574.5917018 ],
[ 9830752.4179904 ],
[ 5048741.49619347],
[ 5048741.49619347],
[ 6193761.65838306],
[ 7011208.52689421],
[ 7472629.67506169],
[ 6703588.97058426],
[12325497.01190767],
[ 8771271.89807199],
[ 7404269.77365948],
[11778617.80068999],
[ 7438449.72436059],
[12989362.6092086 ],
[16501681.53816046],
[ 7643529.42856722],
[ 8446750.08374575],
[ 6977028.5761931 ],
[10360394.72613375],
[ 6655199.17914541],
[14610209.66393512],
[ 8907991.70087641],
[ 6005780.11582442],
[ 9916153.31883524],
[ 7609349.47786611],
[10545879.75504555],
[ 5048741.49619347],
[ 6635229.06918205],
[ 7472629.67506169],
[ 5014561.54549237],
[ 5185461.29899789],
[ 5014561.54549237],
[ 9181333.3546694 ],
[ 5219641.249699 ],
[21186545.4868319 ],
[ 7595123.2645335 ],
[ 9420552.21996665],
[ 5219641.249699 ],
[12152141.53436579],
[ 9793978.0018266 ],
[ 5082921.44689458],
[ 5322181.10180231],
[ 9657258.19902218],
[ 5048741.49619347],
[ 5783602.2499698 ],
[ 6669409.01988316],
[11129296.68918484],
[11209565.82130965],
[12237501.64560014],
[12388627.19277401],
[ 6635229.06918205],
[ 5219641.249699 ],
[ 6840308.77338868],
[ 5048741.49619347],
[ 5048741.49619347],
[ 6860278.88335204],
[ 7557989.78629604],
[ 7472629.67506169],
[ 5458900.90460673],
[ 6669409.01988316],
[ 7575169.52716501],
[ 6840308.77338868],
[ 7540989.5764639 ],
[ 5458900.90460673],
[ 7575169.52716501],
[10135638.76741131],
[18164793.83984446],
[ 7301729.92155617],
[ 6601049.11848095],
[ 9230017.0015558 ],
[ 7284648.13250305],
[ 6601049.11848095],
[ 6840308.77338868],
[ 9865152.68933882],
[ 9230017.0015558 ],
[ 8737091.94737088],
[10631239.8662799 ],
[ 6566869.16777984],
[12408303.44728982],
[12220599.38758385],
[ 6977028.5761931 ],
[ 7882650.34204862],
[ 7609349.47786611],
[ 5082921.44689458],
[ 5493080.85530784],
[ 9813613.46673192],
[ 5048741.49619347],
[ 9591533.55269315],
[ 6193622.91695672],
[ 5048741.49619347],
[ 8138950.99639898],
[ 5903240.2637211 ],
[ 5185461.29899789],
[ 4724040.15083041],
[ 5783602.2499698 ],
[ 5082921.44689458],
[14319729.05888365],
[ 6552642.95444723],
[ 8514832.50229529],
[ 9095932.45382456],
[ 8856672.79891683],
[ 6022723.16345119],
[ 7438449.72436059],
[ 7609349.47786611],
[12408442.18871615],
[ 5219641.249699 ],
[ 7643529.42856722],
[ 7011208.52689421],
[12169460.0166611 ],
[ 8258531.84794492],
[ 7421367.93530747],
[ 5117101.39759568],
[ 5151281.34829679],
[11876066.67368377],
[ 5185461.29899789],
[ 5663980.60881336],
[ 7253005.48505928],
[ 7270144.43631776],
[ 5937322.26260636],
[ 8532012.24316425],
[ 6057041.85557864],
[ 5082921.44689458],
[11434460.52145843],
[ 7916830.29274972],
[ 7267549.97085506],
[ 9181333.3546694 ],
[ 6703588.97058426],
[13228842.58476366],
[ 9095932.45382456],
[ 9230017.0015558 ],
[ 5082921.44689458],
[ 7575169.52716501],
[ 7301729.92155617],
[ 5458900.90460673],
[ 8737091.94737088],
[ 5117101.39759568],
[ 6518479.37634099],
[ 5117101.39759568],
[ 9061752.50312346],
[13516508.40370511],
[ 5117101.39759568],
[ 5698160.55951447],
[ 5185461.29899789],
[18996466.92168823],
[ 6601049.11848095],
[ 5185461.29899789],
[ 6635229.06918205],
[ 5561440.75671005],
[ 7253005.48505928],
[ 6601049.11848095],
[ 8207490.42883802],
[11434460.52145843],
[ 5048741.49619347],
[ 9505993.91042197],
[ 9044891.03471766],
[ 7404269.77365948],
[ 6601049.11848095],
[ 5288001.15110121],
[ 7472629.67506169],
[ 5851962.15137201],
[ 5014561.54549237],
[ 6737768.92128537],
[ 9574704.82947709],
[ 5185461.29899789],
[ 6142401.96681299],
[ 5117101.39759568],
[ 6977028.5761931 ],
[ 5390541.00320452],
[11539456.09759811],
[ 7592169.73699715],
[10414740.69042992],
[ 6840308.77338868],
[ 6737768.92128537],
[13619146.20762428],
[ 7150604.3743823 ],
[21386493.4223185 ],
[ 6142442.75642348],
[ 7523907.78741079],
[ 5288001.15110121],
[ 6364522.67046225],
[ 7182108.28039974],
[ 9161657.10015359],
[ 9386413.05887603],
[ 5151281.34829679],
[ 5048741.49619347],
[ 5014561.54549237],
[ 5082921.44689458],
[ 5390541.00320452],
[ 7216288.23110084],
[ 6908668.67479089],
[12613660.63434912],
[ 5117101.39759568],
[ 7216288.23110084],
[ 5185461.29899789],
[ 6601049.11848095],
[ 7575169.52716501],
[ 6669409.01988316],
[ 6039862.11470967],
[ 7301689.13194568],
[10240854.6641983 ],
[12103278.35644257],
[ 5441761.95334826],
[12049601.68226248],
[ 5117101.39759568],
[ 8292711.79864603],
[ 9488952.91097935],
[ 5766520.46091668],
[ 5185461.29899789],
[ 7370089.82295838],
[ 8121812.0451405 ],
[ 6398702.62116335],
[ 8891105.81545499],
[ 5014561.54549237],
[12357221.23857242],
[ 5117101.39759568],
[ 9230017.0015558 ],
[10651054.86222205],
[ 7558030.57590653],
[ 9745253.56532971],
[ 6091221.80627974],
[ 9711073.61462861],
[ 5732340.51021557],
[ 5082921.44689458],
[ 6364383.92903591],
[15737813.39201355],
[ 8275850.33024023],
[ 8685870.99722715],
[ 7370089.82295838],
[ 6057041.85557864],
[ 9181333.3546694 ],
[ 7335909.87225727],
[ 5014561.54549237],
[ 5117101.39759568],
[ 5082921.44689458],
[ 5151281.34829679],
[15957437.58201596],
[10392118.9527985 ],
[ 7065064.73211113],
[ 5356361.05250342],
[ 7523907.78741079],
[13348243.90527277],
[ 5185461.29899789],
[ 6566869.16777984],
[ 9505993.91042197],
[ 7438449.72436059],
[ 8856672.79891683],
[ 5048741.49619347],
[ 9318053.15747382],
[10853319.78031867],
[11177980.33607124],
[ 7438449.72436059],
[ 7065064.73211113],
[ 8634552.09526757],
[15120036.97613632],
[ 7882650.34204862],
[ 9505993.91042197],
[ 7372978.14386862],
[ 7438449.72436059],
[ 4980381.59479126],
[11812797.7513911 ],
[ 5219641.249699 ],
[10565653.96137721],
[ 5117101.39759568],
[ 9095932.45382456],
[ 8378251.44091721],
[ 5458900.90460673],
[ 6261982.81835893],
[ 6227941.60908416],
[ 7677709.37926832],
[ 5288001.15110121],
[ 7472629.67506169],
[ 7404269.77365948],
[ 6840308.77338868],
[ 5082921.44689458],
[ 8207351.68741168],
[ 7814429.18207274],
[ 7472629.67506169],
[ 6908668.67479089],
[ 9420593.00957714],
[ 7643529.42856722],
[ 5920322.05277422],
[ 7216288.23110084],
[ 7404269.77365948],
[ 5544301.80545157],
[ 6842944.02846187],
[ 9505993.91042197],
[ 9659852.66448487],
[14959319.18084988],
[12679287.3288623 ],
[ 7711750.58854309],
[ 9198472.30592788],
[ 5048741.49619347],
[12493843.08956099],
[ 5082921.44689458],
[ 6703588.97058426],
[ 5288001.15110121],
[ 5048741.49619347],
[ 8532012.24316425],
[ 8514832.50229529],
[ 9999057.70603323],
[ 5082921.44689458],
[ 7643529.42856722],
[ 6635229.06918205],
[ 9896379.11250358],
[ 7540989.5764639 ],
[22699541.38969824],
[ 6125401.75698085],
[ 7404269.77365948],
[ 7031178.63685757],
[ 9010392.81155339],
[ 8207351.68741168],
[ 8685732.25580081],
[ 5527260.80600894],
[ 8737091.94737088],
[ 5322181.10180231],
[ 9369372.0594334 ],
[ 5749422.29926869],
[ 5117101.39759568],
[ 9505993.91042197],
[ 5783602.2499698 ],
[ 5817782.2006709 ],
[ 8873811.7501753 ],
[ 9027572.55242235],
[12103278.35644257],
[ 8121812.0451405 ],
[ 7472629.67506169],
[ 8378251.44091721],
[ 5458900.90460673],
[ 5920322.05277422],
[ 6566869.16777984],
[ 8258768.54118711],
[ 8566151.40425487],
[ 5390541.00320452],
[ 7882650.34204862],
[ 5288001.15110121],
[ 5014561.54549237],
[ 5048741.49619347],
[ 6601049.11848095],
[ 6635229.06918205],
[11400280.57075733],
[ 6601049.11848095],
[ 6942848.625492 ],
[ 6635229.06918205],
[ 5185461.29899789],
[ 6635229.06918205],
[ 9745253.56532971],
[ 5390541.00320452],
[11998241.99069241],
[ 6566869.16777984],
[ 6979663.83126629],
[ 5014561.54549237],
[ 7848470.39134751],
[ 7011208.52689421],
[ 5681021.60825599],
[ 9984513.22023745],
[13499606.14568882],
[ 8583331.14512384],
[ 6566869.16777984],
[ 6669409.01988316],
[ 8532012.24316425],
[ 5766422.50910083],
[ 6806128.82268758],
[10189748.03846529],
[ 6737768.92128537],
[13192002.96197375],
[ 9830793.20760089],
[ 9691438.14972328],
[ 5014561.54549237],
[ 5390541.00320452],
[28504021.7220076 ],
[ 6398702.62116335],
[ 6635229.06918205],
[ 5783602.2499698 ],
[ 5766422.50910083],
[ 8292711.79864603],
[ 6532689.21707874],
[ 5219641.249699 ],
[ 5185461.29899789],
[ 9745253.56532971],
[ 5014561.54549237],
[ 9505993.91042197],
[ 8617511.09582494],
[ 5082921.44689458],
[ 5458900.90460673],
[ 5373402.05194605],
[ 6737768.92128537],
[ 8241629.58992863],
[ 7711889.32996943],
[ 6669409.01988316],
[ 8737091.94737088],
[11519861.42230327],
[ 6601049.11848095],
[12303128.34011332],
[ 7575169.52716501],
[ 6635229.06918205],
[ 5851962.15137201],
[ 7184743.53547292],
[ 9386413.05887603],
[ 9127477.14945249],
[ 5082921.44689458],
[ 7438449.72436059],
[ 5288001.15110121],
[10682420.02681314],
[20207154.56800148],
[ 5458900.90460673],
[ 7370089.82295838],
[ 5151281.34829679],
[16279503.67230584],
[ 9059117.24805027],
[ 7267549.97085506],
[ 5185461.29899789],
[ 5082921.44689458],
[ 8227125.89374334],
[ 8990757.34664806],
[ 8856672.79891683],
[ 9027572.55242235],
[ 4929119.85503704],
[ 8583331.14512384],
[ 9505993.91042197],
[ 7301729.92155617],
[ 7472629.67506169],
[ 5048741.49619347],
[ 6601049.11848095]])
#Scoring the predictions against themselves: this is 1.0 by construction and does not measure accuracy
accuracy = lm.score(X_test,Y_pred)
accuracy
1.0
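Two caveats apply to the result above. First, Y_pred was computed from X_test, the held-out slice of Train .csv, not from the Test .csv data in tz. Second, the score of 1.0 compares Y_pred with itself, so it is 1.0 by construction and says nothing about accuracy. Test .csv has no total_cost column, so the model can produce predicted spending for those tourists from the same four features, but the accuracy of those predictions cannot be measured without the true costs. The held-out r2_score of about 0.18 reported earlier remains the best estimate of the model's performance.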
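For reference, predicting for rows of Test .csv means feeding the same four feature columns of `tz` to the model; since that file has no `total_cost`, there is nothing to score the result against. The sketch below is an illustration under assumptions: it uses the intercept and coefficients printed after the first fit (not those of the later `lm.fit(X,Y)` refit, which the notebook does not print) and the feature values of the first three rows of `tz` shown above. In the notebook itself the equivalent call would be along the lines of `lm.predict(tz[X.columns])`, assuming those columns contain no missing values.

```python
# Intercept and coefficients copied from the first fit's printed output
intercept = 4462382.63431905
coef = [2071453.64220804, 748866.97405083, 35552.52558308, 293057.24098337]

# (total_female, total_male, night_mainland, night_zanzibar) for the first three rows of tz
test_rows = {
    "tour_1": (1.0, 1.0, 10, 3),
    "tour_100": (0.0, 4.0, 13, 0),
    "tour_1001": (3.0, 0.0, 7, 14),
}

# Predicted cost = intercept + sum(coefficient * feature value)
predicted_cost = {
    tour_id: intercept + sum(c * v for c, v in zip(coef, values))
    for tour_id, values in test_rows.items()
}

for tour_id, cost in predicted_cost.items():
    print(tour_id, round(cost, 2))
```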
X_test
| | total_female | total_male | night_mainland | night_zanzibar |
|---|---|---|---|---|
| 4284 | 0.0 | 1.0 | 3.0 | 0.0 |
| 3401 | 0.0 | 1.0 | 11.0 | 0.0 |
| 3204 | 1.0 | 1.0 | 0.0 | 7.0 |
| 3362 | 0.0 | 1.0 | 14.0 | 0.0 |
| 3822 | 0.0 | 1.0 | 5.0 | 0.0 |
| ... | ... | ... | ... | ... |
| 2147 | 1.0 | 1.0 | 0.0 | 7.0 |
| 1418 | 1.0 | 1.0 | 2.0 | 0.0 |
| 3654 | 1.0 | 1.0 | 7.0 | 0.0 |
| 3137 | 0.0 | 1.0 | 2.0 | 0.0 |
| 2253 | 1.0 | 0.0 | 3.0 | 0.0 |
480 rows × 4 columns
Y_pred.sum()
3773139286.330149
Besides the observations noted in the markdown text above, the tourism sector needs to pay more attention to tourism activities that are not attracting tourists, such as
Secondly, most tourists prefer to pay directly in cash. This is risky: given the exchange rate between the Tanzanian shilling and the currencies of Europe, the USA, etc., tourists end up moving about with large sums of money. It would be preferable to encourage an electronic payment system, which may also offer a better return on investment. Lastly, most tourists prefer non-packaged food, possibly because of its flexibility, taste and freshness, so expanding it should be given more priority.
NOTE: The linear regression model chosen for this project did not fit the given dataset well (an r2_score of about 0.18 on the held-out split), which may affect the model's reliability and performance in future use. Trying other regression models, such as Three distribution, etc., would be helpful in future development.